Abstract

The accuracy of Gibbs sampling, a Markov chain Monte Carlo procedure, was considered for 
estimation of item and ability parameters under the two-parameter logistic model. Memory 
test data were analyzed to illustrate the Gibbs sampling procedure. Simulated data sets were 
analyzed using Gibbs sampling and the marginal Bayesian method. The marginal Bayesian 
method combined with the expected a posteriori estimation of ability yielded consistently 
smaller root mean square errors and better bias results than Gibbs sampling. 



Keywords: Bayesian inference, Gibbs sampling, item response theory, Markov chain Monte 
Carlo, marginal Bayesian. 



Introduction 



For models with several parameters, statistical inference sometimes requires integration over 
high-dimensional probability distributions in order to estimate any parameter of interest or 
to obtain any particular function of the parameters. One such case is estimation of item 
and ability parameters in the context of item response theory (IRT). Except for certain 
rather simple problems with highly structured frameworks (e.g., an exponential family 
together with conjugate priors in the Bayesian approach), the required integrations may be
analytically intractable. As is true for many cases in statistics, the marginal density can
be approximated using various techniques (e.g., standard numerical integration, Laplacian 
approximation, Edgeworth expansion, importance sampling, Metropolis algorithm; see 
Bernardo & Smith, 1994; Leonard & Hsu, 1994). In this paper, we examine the accuracy 
of Gibbs sampling, one of the Markov Chain Monte Carlo (MCMC) methods for marginal 
density estimation, for estimation of IRT parameters. In particular, we focus on the accuracy 
of Gibbs sampling (Geman & Geman, 1984) for estimation of item and ability parameters 
under the two-parameter logistic (2PL) model when sample sizes are small. 

A number of ways exist for implementing the MCMC method. [For a review, refer 
to Bernardo and Smith (1994), Carlin and Louis (1996), and Gelman, Carlin, Stern, and 
Rubin (1995).] Metropolis and Ulam (1949), Metropolis, Rosenbluth, Rosenbluth, Teller, and
Teller (1953), and Hastings (1970) present a general framework within which Gibbs sampling
(Geman & Geman, 1984) can be considered as a special case. In this regard, Gelfand
and Smith (1990) discuss several different Monte Carlo-based approaches, including Gibbs
sampling, for calculating marginal densities. [See Gilks, Richardson, and Spiegelhalter (1996)
for a recent survey of applications.] In principle, Gibbs sampling can be used to obtain
parameter estimates from the complicated joint posterior distribution in Bayesian estimation
under IRT (e.g., Mislevy, 1986; Swaminathan & Gifford, 1985; Tsutakawa & Lin, 1986).

A few studies have examined the use of Gibbs sampling under IRT. Albert (1992) 
applied Gibbs sampling in the context of IRT to estimate item parameters for the two- 
parameter normal ogive model and compared these estimates with those obtained using 
maximum likelihood estimation. Baker (1998) has also investigated item parameter recovery 
characteristics of Albert’s Gibbs sampling method for item parameter estimation via a 
simulation study. Patz and Junker (1997) developed an MCMC method based on the
Metropolis-Hastings algorithm and presented an illustration using the 2PL model.

MCMC computer programs in the context of IRT have largely been developed
for specific applications. For example, Albert (1992) used a computer program written in
MATLAB (The MathWorks, Inc., 1996). Baker (1998) developed a specialized FORTRAN 
version of Albert’s Gibbs sampling program to estimate item parameters of the two parameter 
normal ogive model. Patz and Junker (1997) developed an S-PLUS code (MathSoft, 
Inc., 1995). Spiegelhalter, Thomas, Best, and Gilks (1997) have also developed a general
Gibbs sampling computer program, BUGS, for Bayesian estimation, using the adaptive
rejection sampling algorithm (Gilks & Wild, 1992). The computer program BUGS requires
specification of the complete conditional distributions.

The marginal maximum likelihood (MML) and marginal Bayesian (MB) methods using
the expectation-maximization (EM) algorithm, as implemented in the computer program
BILOG (Mislevy & Bock, 1990), have become the standard estimation techniques for
obtaining item parameter estimates in IRT. Ability parameters are estimated in those
marginalized solutions using either maximum likelihood (ML), expected a posteriori (EAP), 
or maximum a posteriori (MAP) estimation after obtaining the item parameter estimates 
and assuming the estimates are true values. The Gibbs sampling procedure approaches the 
estimation of item parameters using the joint posterior distribution rather than the marginal 
distribution. In Gibbs sampling ability parameters can be estimated either jointly with item 
parameters or after obtaining the item parameters. All of the estimation methods should
yield comparable item and ability parameter estimates when comparable priors are used, or
when ignorance or locally uniform priors are used and sample sizes are large. This study
was designed to evaluate the comparability of item and ability parameter estimates using the 
2PL model. Specifically, estimation methods implemented in the two computer programs, 
BUGS and BILOG, were examined and compared. 

Theoretical Framework 



Marginalized Solutions 

Consider binary responses to a test with $n$ items by each of $N$ examinees. A response of
examinee $i$ to item $j$ is represented by a random variable $Y_{ij}$, where $i = 1(1)N$ and $j = 1(1)n$.
The probability of a correct response of examinee $i$ to item $j$ is given by $P(Y_{ij} = 1 \mid \theta_i, \xi_j) = P_{ij}$
and the probability of an incorrect response is given by $P(Y_{ij} = 0 \mid \theta_i, \xi_j) = 1 - P_{ij} = Q_{ij}$,
where $\theta_i$ is ability and $\xi_j$ is the vector of item parameters.

For examinee $i$, there is an observed vector of dichotomously scored item responses
of length $n$, $Y_i = (Y_{i1}, \ldots, Y_{in})'$. Under the assumption of conditional independence, the
probability of $Y_i$ given $\theta_i$ and the vector of all item parameters, $\xi = (\xi_1, \ldots, \xi_n)'$, is

$$p(Y_i \mid \theta_i, \xi) = \prod_{j=1}^{n} P_{ij}^{Y_{ij}} Q_{ij}^{1 - Y_{ij}}. \tag{1}$$



The marginal probability of obtaining the response vector $Y_i$ for examinee $i$ sampled from a
given population is

$$p(Y_i \mid \xi) = \int p(Y_i \mid \theta_i, \xi)\, p(\theta_i)\, d\theta_i, \tag{2}$$

where $p(\theta_i)$ is the population distribution of $\theta_i$. Without loss of generality, we can assume
that the $\theta_i$ are independent and identically distributed as standard normal, $\theta_i \sim N(0, 1)$. This
assumption may be relaxed as the ability distribution can also be empirically characterized 
(Bock & Aitkin, 1981). The marginal probability of Yi can be approximated with any 
specified degree of precision by Gaussian quadrature formulas (Stroud & Secrest, 1966). 
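
As a rough illustration of Equation 2, the sketch below approximates the marginal probability of a response pattern under the 2PL by summing over equally spaced ability points weighted by the standard normal density. This is a simple stand-in for the Gaussian quadrature formulas cited above and is not the BILOG implementation; the item parameter values, the grid limits, and all function names are assumptions chosen for illustration.

```python
import math
from itertools import product


def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def quadrature_grid(num_points=41, lo=-4.0, hi=4.0):
    """Equally spaced nodes with normalized standard-normal weights (assumed grid)."""
    step = (hi - lo) / (num_points - 1)
    nodes = [lo + q * step for q in range(num_points)]
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def marginal_prob(y, a, b, nodes, weights):
    """Approximate the integral of p(Y_i | theta, xi) p(theta) over theta."""
    total = 0.0
    for x, w in zip(nodes, weights):
        like = 1.0
        for y_j, a_j, b_j in zip(y, a, b):
            p = p_correct(x, a_j, b_j)
            like *= p if y_j == 1 else 1.0 - p
        total += w * like
    return total


# Hypothetical three-item test, not taken from the memory test data.
a = [1.0, 0.7, 1.3]
b = [-1.0, 0.0, 1.0]
nodes, weights = quadrature_grid()
pattern_probs = {
    y: marginal_prob(y, a, b, nodes, weights) for y in product([0, 1], repeat=3)
}
```

Because the weights are normalized, the marginal probabilities of all eight response patterns sum to one, which gives a quick sanity check on the approximation.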

The marginal probability of obtaining the $N \times n$ response matrix $Y$ is given by

$$p(Y \mid \xi) = \prod_{i=1}^{N} p(Y_i \mid \xi) = l(\xi \mid Y), \tag{3}$$

where $l(\xi \mid Y)$ can be regarded as a function of $\xi$ given the data $Y$. In MML, the marginal
likelihood is maximized to obtain maximum likelihood estimates of item parameters (Bock 
& Aitkin, 1981; Bock & Lieberman, 1970). 

Bayes’ theorem tells us that the marginal posterior probability distribution for $\xi$ given
the data, $Y$, is proportional to the product of the marginal likelihood for $\xi$ given $Y$ and the
prior distribution of $\xi$. That is,

$$p(\xi \mid Y) \propto l(\xi \mid Y)\, p(\xi), \tag{4}$$

where $\propto$ denotes proportionality. The marginal likelihood function represents the information
obtained about $\xi$ from the data. In this way, the data modify our prior knowledge
of $\xi$. A prior distribution represents what is known about unknown parameters before the
data are obtained. Prior knowledge or even relative ignorance can be represented by such a 
distribution. In MB estimation of item parameters, the marginal posterior is maximized to 
obtain Bayes modal estimates of item parameters (see Mislevy, 1986). 

Point estimates of ability parameters do not arise during the course of the marginalized
estimation of item parameters. They are calculated after the item parameters are estimated,
assuming the obtained item parameters are true values. Three methods are generally
available: ML, EAP (i.e., posterior mean), and MAP (i.e., posterior mode) (Bock & Aitkin,
1981; Bock & Mislevy, 1982).

Joint Estimation Procedures 

Birnbaum (1968) and Lord (1980) describe the estimation of $\theta$ and $\xi$ by joint
maximization of the likelihood function

$$p(Y \mid \theta, \xi) = \prod_{i=1}^{N} \prod_{j=1}^{n} P_j(\theta_i)^{Y_{ij}} Q_j(\theta_i)^{1 - Y_{ij}} = l(\theta, \xi \mid Y), \tag{5}$$

where $\theta = (\theta_1, \ldots, \theta_N)'$. In implementation of joint maximum likelihood (JML) estimation
(see Lord, 1986 for a comparison of marginalized and joint estimation methods), the item
parameter estimation part for maximizing $l(\xi \mid Y, \theta)$ and the ability parameter estimation part
for maximizing $l(\theta \mid Y, \xi)$ are iterated until a stable set of maximum likelihood estimates of
item and ability parameters is obtained.

Extending the idea of joint maximization, Swaminathan and Gifford (1982, 1985, 1986)
suggested that $\theta$ and $\xi$ can be estimated by joint maximization with respect to the parameters
of the posterior density

$$p(\theta, \xi \mid Y) = \frac{p(Y \mid \theta, \xi)\, p(\theta, \xi)}{p(Y)} \propto l(\theta, \xi \mid Y)\, p(\theta, \xi), \tag{6}$$

where $p(\theta, \xi)$ is the prior density of the parameters $\theta$ and $\xi$. This procedure is joint Bayesian
(JB) estimation. Under the assumption that priors of $\theta$ and $\xi$ are independently distributed
with probability density functions $p(\theta)$ and $p(\xi)$, the item parameter estimation part
maximizing $l(\xi \mid Y, \theta)\, p(\xi)$ and the ability parameter estimation part maximizing $l(\theta \mid Y, \xi)\, p(\theta)$
are iterated to obtain stable Bayes modal estimates of item and ability parameters.

Gibbs Sampling 

The main feature of MCMC methods is to obtain a sample of parameter values from the 
posterior density (Tanner, 1996). The sample of parameter values then can be used to 
estimate some functions or moments (e.g., mean and variance) of the posterior density of 
the parameter of interest. In the IRT estimation procedures via MML, MB, JML, or JB noted
above, however, the task is to obtain modes of the likelihood function or of the posterior
distribution.

The Gibbs sampling algorithm is as follows (Gelfand & Smith, 1990; Tanner, 1996).
First, instead of using $\theta$ and $\xi$, let $\omega$ be a vector of parameters with $k$ elements. Suppose
that the full or complete conditional distributions, $p(\omega_i \mid \omega_j, Y)$, where $i = 1(1)k$ and $j \neq i$,
are available for sampling. That is, samples may be generated by some method given values
of the appropriate conditioning random variables. Then given an arbitrary set of starting
values, $\omega_1^{(0)}, \ldots, \omega_k^{(0)}$, the algorithm proceeds as follows:

Draw $\omega_1^{(1)}$ from $p(\omega_1 \mid \omega_2^{(0)}, \ldots, \omega_k^{(0)}, Y)$,

Draw $\omega_2^{(1)}$ from $p(\omega_2 \mid \omega_1^{(1)}, \omega_3^{(0)}, \ldots, \omega_k^{(0)}, Y)$,

$\vdots$

Draw $\omega_k^{(1)}$ from $p(\omega_k \mid \omega_1^{(1)}, \ldots, \omega_{k-1}^{(1)}, Y)$,

Draw $\omega_1^{(2)}$ from $p(\omega_1 \mid \omega_2^{(1)}, \ldots, \omega_k^{(1)}, Y)$,

Draw $\omega_2^{(2)}$ from $p(\omega_2 \mid \omega_1^{(2)}, \omega_3^{(1)}, \ldots, \omega_k^{(1)}, Y)$,

$\vdots$

Draw $\omega_k^{(2)}$ from $p(\omega_k \mid \omega_1^{(2)}, \ldots, \omega_{k-1}^{(2)}, Y)$,

$\vdots$

Draw $\omega_1^{(t+1)}$ from $p(\omega_1 \mid \omega_2^{(t)}, \ldots, \omega_k^{(t)}, Y)$,

Draw $\omega_2^{(t+1)}$ from $p(\omega_2 \mid \omega_1^{(t+1)}, \omega_3^{(t)}, \ldots, \omega_k^{(t)}, Y)$,

$\vdots$

Draw $\omega_k^{(t+1)}$ from $p(\omega_k \mid \omega_1^{(t+1)}, \ldots, \omega_{k-1}^{(t+1)}, Y)$.



The vectors $\omega^{(0)}, \ldots, \omega^{(t)}, \ldots$ are a realization of a Markov chain with a transition probability
from $\omega^{(t)}$ to $\omega^{(t+1)}$ given by

$$p(\omega^{(t)}, \omega^{(t+1)}) = \prod_{l=1}^{k} p(\omega_l^{(t+1)} \mid \omega_j^{(t)}, j > l;\; \omega_j^{(t+1)}, j < l;\; Y). \tag{7}$$

The joint distribution of $\omega^{(t)}$ converges geometrically to the posterior distribution $p(\omega \mid Y)$
as $t \to \infty$ (Geman & Geman, 1984; Bernardo & Smith, 1994). In particular, $\omega_i^{(t)}$ tends to
be distributed as a random quantity whose density is $p(\omega_i \mid Y)$. Now suppose that there exist
$m$ replications of the $t$ iterations. For large $t$, the replicates $\omega_{i1}^{(t)}, \ldots, \omega_{im}^{(t)}$ are approximately
a random sample from $p(\omega_i \mid Y)$. If we make $m$ reasonably large, then an estimate, $\hat{p}(\omega_i \mid Y)$,
can be obtained either as a kernel density estimate derived from the replicates or as

$$\hat{p}(\omega_i \mid Y) = \frac{1}{m} \sum_{l=1}^{m} p(\omega_i \mid \omega_{jl}^{(t)}, j \neq i, Y). \tag{8}$$

In the context of IRT, Gibbs sampling attempts to sample sets of parameters from the
joint posterior density $p(\theta, \xi \mid Y)$. Inferences with regard to parameters can then be made
using the sampled parameters. Note that inference for both $\theta$ and $\xi$ can be made from the
Gibbs sampling procedure.
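
The cycle of full-conditional draws described in this section can be sketched on a toy target whose conditionals are known in closed form. The example below is not an IRT model: it assumes a bivariate normal with zero means, unit variances, and correlation `rho`, for which each full conditional is normal with mean `rho` times the other component and variance `1 - rho**2`. The function name, starting values, and chain length are illustrative choices only.

```python
import random


def gibbs_bivariate_normal(rho, num_iter, start=(5.0, -5.0), seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5
    w1, w2 = start  # deliberately poor starting values
    chain = []
    for _ in range(num_iter):
        # Each draw conditions on the most recent value of the other component.
        w1 = rng.gauss(rho * w2, cond_sd)
        w2 = rng.gauss(rho * w1, cond_sd)
        chain.append((w1, w2))
    return chain


chain = gibbs_bivariate_normal(rho=0.6, num_iter=22000)
kept = chain[2000:]  # discard an initial stretch as burn-in

n_kept = len(kept)
mean1 = sum(w[0] for w in kept) / n_kept
mean2 = sum(w[1] for w in kept) / n_kept
var1 = sum((w[0] - mean1) ** 2 for w in kept) / (n_kept - 1)
var2 = sum((w[1] - mean2) ** 2 for w in kept) / (n_kept - 1)
cov12 = sum((w[0] - mean1) * (w[1] - mean2) for w in kept) / (n_kept - 1)
corr = cov12 / (var1 * var2) ** 0.5
```

After burn-in, the sample means and the sample correlation of the retained draws should sit close to the target values of 0 and 0.6, even though the chain was started far from the bulk of the distribution.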

An Example 

Steps for Gibbs Sampling 

The following example is presented using the 10-item memory test data for 40 examinees from 
Thissen (1982) (see Table 1). Model parameters were estimated by Gibbs sampling using 
the computer program BUGS (Spiegelhalter et al., 1997). These same data were analyzed 
under the Rasch model in Thissen (1982). 



Insert Table 1 about here 

Gibbs sampling uses the following four basic steps (cf. Spiegelhalter, Best, et al., 1996). 

1. Full conditional distributions and sampling methods for unobserved parameters must 
be specified. 

2. Starting values must be provided. 

3. Output must be monitored. 

4. Summary statistics (e.g., estimates and standard errors) for quantities of interest must 
be calculated. 

The four steps are discussed in detail below. In addition,
comparisons with the results from the marginalized methods (e.g., MB and MML) as
implemented in the computer program BILOG (Mislevy & Bock, 1990) are presented.



Model Specifications 

The model specifications are used as input to the BUGS computer program. In the memory
test data set, the item responses $Y_{ij}$ are independent, conditional on their parameters $P_{ij}$.
For examinee $i$ and item $j$, each $P_{ij}$ is a function of the ability parameter $\theta_i$, the item
discrimination parameter $\alpha_j$, and the item difficulty parameter $\beta_j$ under the 2PL. The $\theta_i$
are assumed to be independently drawn from a standard normal distribution for scaling
purposes. Figure 1 shows a directed acyclic graph (see Lauritzen, Dawid, Larsen, & Leimer,
1990; Whittaker, 1990; Spiegelhalter, Dawid, Lauritzen, & Cowell, 1993) based on these
assumptions. $\lambda_j$ and $\zeta_j$ are used in Figure 1 instead of $\alpha_j$ and $\beta_j$ (see Equation 11). The
model can be seen as directed because each link between nodes is represented as an arrow.
The model can also be seen as acyclic because it is impossible to return to a node after leaving.
It is only possible to proceed by following the directions of the arrows. Each variable or
quantity in the model appears as a node in the graph, and directed links correspond to direct
dependencies as specified above. Solid arrows denote probabilistic dependencies, while
dashed arrows indicate functional or deterministic relationships. The rectangle designates
observed data, and circles represent unknown quantities.



Insert Figure 1 about here 



We use the following definitions: Let v be a node in the graph, and V be the set of all 
nodes. A parent of v is defined as any node with an arrow extending from it and pointing to 
v. A descendant of v is defined as any node on a direct path beginning from v. For identifying 
parents and descendants, deterministic links should be combined so that, for example, the 
parent of $Y_{ij}$ is $P_{ij}$. It is assumed in Figure 1 that, for any node v, if we know the value of
its parents, then no other nodes would be informative concerning v except descendants of v. 

Lauritzen et al. (1990) indicated that, in a full probability model, the directed acyclic 
graph model is equivalent to assuming that the joint distribution of all the random quantities 
is fully specified in terms of the conditional distribution of each node given its parents. That 
is, 

$$P(V) = \prod_{v \in V} P(v \mid \text{parents}[v]), \tag{9}$$

where $P(\cdot)$ denotes a probability distribution. This factorization not only allows extremely
complex models to be built up from local components, but also provides an efficient basis
for the implementation of MCMC methods (Spiegelhalter, Best, et al., 1996).

Gibbs sampling via the BUGS computer program works by iteratively drawing samples
from the full conditional distributions of unobserved nodes in Figure 1 using the adaptive
rejection sampling algorithm (Gilks, 1996; Gilks & Wild, 1992). For any node $v$, the
remaining nodes are denoted by $V - v$. It follows that the full conditional distribution,
$P(v \mid V - v)$, has the form

$$P(v \mid V - v) \propto P(v, V - v) \propto P(v \mid \text{parents}[v]) \prod_{w \in \text{children}[v]} P(w \mid \text{parents}[w]). \tag{10}$$

The proportionality constant, which is a function of the remaining nodes, ensures that the
distribution is a probability function that integrates to unity.
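
To make the factorization concrete for one node, the sketch below evaluates the unnormalized log full conditional of a single ability parameter under the model assumptions stated earlier: a standard normal prior term for the node itself, times one Bernoulli term for each item response that depends on it. It uses the slope and intercept form of the 2PL employed in this paper; the function name and the numerical values are hypothetical, and this is not how BUGS evaluates the density internally.

```python
import math


def log_full_conditional_theta(theta, y_i, lams, zetas):
    """Unnormalized log full conditional of one ability given its item responses."""
    # Term for the node itself: N(0, 1) kernel (the ability node has no parents).
    log_prior = -0.5 * theta * theta
    # Terms for the children: each response is Bernoulli with a logistic success rate.
    log_children = 0.0
    for y, lam, zeta in zip(y_i, lams, zetas):
        p = 1.0 / (1.0 + math.exp(-(lam * theta + zeta)))
        log_children += math.log(p) if y == 1 else math.log(1.0 - p)
    return log_prior + log_children


# Hypothetical slopes and intercepts for a three-item test.
lams = [1.0, 1.0, 1.0]
zetas = [0.0, 0.0, 0.0]
all_correct = [1, 1, 1]
high = log_full_conditional_theta(1.0, all_correct, lams, zetas)
low = log_full_conditional_theta(-1.0, all_correct, lams, zetas)
```

For an all-correct response pattern the density is larger at a positive ability than at the mirror-image negative ability, since the prior term is symmetric and only the response terms differ.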

To analyze the memory test data, we begin by specifying the forms of the parent and child
relationships in Figure 1. Under the 2PL model, the probability that examinee $i$ responds
correctly to item $j$ is assumed to follow a logistic function parameterized by the examinee’s
latent ability $\theta_i$, the item discrimination parameter, $\alpha_j$, and the item difficulty parameter, $\beta_j$.
For estimation purposes, we use the form $\alpha_j(\theta_i - \beta_j) = \lambda_j\theta_i + \zeta_j$, where the slope parameter
$\lambda_j = \alpha_j$ and the intercept parameter $\zeta_j = -\alpha_j\beta_j$. Hence,

$$P_{ij} = \frac{1}{1 + \exp[-\alpha_j(\theta_i - \beta_j)]} = \frac{1}{1 + \exp[-(\lambda_j\theta_i + \zeta_j)]}. \tag{11}$$

Since $Y_{ij}$ are Bernoulli with parameter $P_{ij}$, we can define

$$Y_{ij} \sim \text{Bernoulli}(P_{ij}) \tag{12}$$

and

$$\text{logit}(P_{ij}) = \lambda_j\theta_i + \zeta_j. \tag{13}$$
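
The equivalence of the two parameterizations in Equation 11 can be checked numerically. The sketch below converts between the discrimination and difficulty form and the slope and intercept form; the function names and the example values are assumptions made for illustration.

```python
import math


def p_discrimination_difficulty(theta, alpha, beta):
    """2PL response probability in the alpha * (theta - beta) form."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def p_slope_intercept(theta, lam, zeta):
    """2PL response probability in the lam * theta + zeta form."""
    return 1.0 / (1.0 + math.exp(-(lam * theta + zeta)))


def to_slope_intercept(alpha, beta):
    """Slope equals discrimination; intercept equals -alpha * beta."""
    return alpha, -alpha * beta


def to_discrimination_difficulty(lam, zeta):
    """Inverse mapping; requires a nonzero slope."""
    return lam, -zeta / lam


alpha, beta, theta = 1.2, 0.5, -0.3  # hypothetical values
lam, zeta = to_slope_intercept(alpha, beta)
p_a = p_discrimination_difficulty(theta, alpha, beta)
p_b = p_slope_intercept(theta, lam, zeta)
alpha_back, beta_back = to_discrimination_difficulty(lam, zeta)
```

The inverse mapping divides by the slope, which is one practical reason a positivity constraint on the slope is convenient when estimates are later reported as difficulties.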

To complete the specification of a full probability model for the BUGS computer program,
prior distributions of the nodes without parents (i.e., $\lambda_j$ and $\zeta_j$) also need to be specified.
We can define these priors in several different ways. We can impose priors on $\lambda_j$ and $\zeta_j$ using
a hierarchical Bayes approach (e.g., Swaminathan & Gifford, 1985; Kim, Cohen, Baker,
Subkoviak, & Leonard, 1994) or, if it is preferred that the priors not be too influential,
uninformative priors could be imposed. Alternatively, it may also be useful to include
external information in the form of fairly informative prior distributions. According to
Spiegelhalter, Best, et al. (1996), it is important to avoid casual use of standard improper
priors in MCMC modeling, since these may result in improper posterior distributions.

Following Spiegelhalter, Thomas, et al. (1996), two prior distributions were chosen for
the memory test analyses: (1) $\lambda_j \sim N(0, 1)$ with $\lambda_j > 0$ and $\zeta_j \sim N(0, 100^2)$ and (2)
$\lambda_j \sim N(0, 10^2)$ with $\lambda_j > 0$ and $\zeta_j \sim N(0, 100^2)$. An example input file for BUGS is given
in the Appendix.

Starting Values 

The choice of starting values (e.g., $\omega^{(0)}$) is not generally that critical, as the Gibbs sampler
(and most other MCMC algorithms as well) should be run long enough to be sufficiently
updated from its initial states. It is useful, however, to perform a number of runs using
different starting values to verify that the final results are not sensitive to the choice of
starting values (Gelman, 1996). Raftery (1996) indicated that extreme starting values could
lead to a very long burn-in or stabilization process.

In this example, three runs were performed using the memory test data with three sets
of starting values for $\lambda_j$ and $\zeta_j$, $j = 1(1)10$. The starting values for the item parameters are
given in Table 2. The first run started at values considered plausible in the light of the usual
range of item parameters. The second run and the third represented substantial deviations
in initial values. In particular, the second run was intended to represent a situation in which
there was a possibility that items were highly discriminating, and the third run represented
an opposite assumption. The priors used in the three runs were the same: $\lambda_j \sim N(0, 1)$ with
$\lambda_j > 0$ and $\zeta_j \sim N(0, 100^2)$.



Insert Table 2 about here 



Each of the three runs consisted of 10,000 iterations. Results for $\lambda_1$ and $\zeta_1$ are presented
in Figure 2. The computer program CODA (Best, Cowles, & Vines, 1997) was used to obtain
these graphs. The top two plots in Figure 2 contain the graphical summaries of the Gibbs
sampler for $\lambda_1$. The top left plot shows the trace of the sampled values of $\lambda_1$ for the three
runs. Results for all three runs show that the $\lambda_1$ generated by the Gibbs sampler quickly
settled down regardless of the starting values. The top right graph shows the kernel density
plot of the three pooled runs of 30,000 values for $\lambda_1$. The variability among the $\lambda_1$ values
generated by the Gibbs sampler seems to be large, possibly due to the small sample size.
The distribution looks like a truncated normal form due to the positive constraint on $\lambda_j$.



Insert Figure 2 about here 



The bottom two plots contain graphical summaries of the Gibbs sampler for $\zeta_1$. The
bottom left plot shows the trace of the sampled values of $\zeta_1$ for all three runs. The $\zeta_1$
generated by the Gibbs sampler quickly settled down regardless of the starting values. The
bottom right graph shows the kernel density plot of the three pooled runs of 30,000 values
for $\zeta_1$. The variability of the $\zeta_1$ values seems to be large. The sampled values seem to be
concentrated around -2 and seem to follow a normal distribution.

The results for other item parameter estimates were very similar to those for $\lambda_1$ and $\zeta_1$.
Overall, the starting values appear not to have affected the final results. Useful starting
values for IRT problems can be found from the noniterative minimum logit chi-square
estimation solution (Baker, 1987) or from values based on Jensema (1976) and Urry (1974)
as employed in BILOG. Use of “good” starting values, such as from the above methods,
can avoid the time delay required by a lengthy starting period. Our experience with these
starting values indicates $\lambda_j = 1$ and $\zeta_j = 0$ will work sufficiently well for applications under
the 2PL. In subsequent analyses, therefore, the values $\lambda_j = 1$ and $\zeta_j = 0$ were used as
starting values.

Output Monitoring 

A critical issue for MCMC methods including Gibbs sampling is how to determine when one 
can safely stop sampling and use the results to estimate characteristics of the distributions 
of the parameters of interest. In this regard, the values for the unknown quantities generated 
by the Gibbs sampler can be graphically and statistically summarized to check mixing and 
convergence. The method proposed by Gelman and Rubin (1992) is one of the most popular 
for monitoring Gibbs sampling. [Cowles and Carlin (1996) presented a comparative review 

of convergence diagnostics for MCMC algorithms.] 

We illustrate here the use of Gelman and Rubin (1992) statistics on three 10,000 iteration
runs. Details of the Gelman and Rubin method are given by Gelman (1996). Each 10,000
iteration run required about 10 minutes on a Pentium 90 megahertz computer. Monitoring
was done using the suite of S-functions called CODA (Best et al., 1997). Figure 3a shows
the trace lines of the sampled values of $\lambda_1$ and $\zeta_1$ for the three runs. The plots in Figure
3a indicate that the three runs yielded similar values. Gelman-Rubin statistics (i.e., shrink
factors) are plotted in Figure 3b for $\lambda_1$ and $\zeta_1$. For both parameters, the medians
stabilized after roughly 500 iterations and definitely after about 5,000 iterations.



Insert Figures 3a and 3b about here 



For each parameter, the Gelman-Rubin statistics estimate the reduction in the pooled
estimate of variance if the runs were continued indefinitely. The Gelman-Rubin statistics
should be near 1 in order to be reasonably assured that convergence has occurred. The
median for $\lambda_1$ in the example was 1.00 and the 97.5 percentage point was 1.00. The median
for $\zeta_1$ was 1.00 and the 97.5 percentage point was 1.00. These values indicated that reasonable
convergence was realized for these parameters.
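
A minimal sketch of a shrink factor computation is given below, assuming several chains of equal length for a single parameter. It implements only the basic ratio of the pooled variance estimate to the within-chain variance; the version in CODA includes further corrections and reports a median and an upper percentage point, which this sketch does not attempt. The function name and the simulated chains are illustrative assumptions.

```python
import random
from statistics import mean, variance


def shrink_factor(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])  # W: average within-chain variance
    between = n * variance(chain_means)           # B: between-chain variance
    pooled = (n - 1) / n * within + between / n   # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(7)
# Three chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# Three chains stuck around different values (not converged).
stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 3.0, 6.0)]

r_mixed = shrink_factor(mixed)
r_stuck = shrink_factor(stuck)
```

Chains that explore the same distribution give a value close to 1, whereas chains centered on different values give a value well above 1, which is the pattern the diagnostic is designed to flag.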

The Gelman-Rubin statistics can be calculated sequentially as the runs proceed, and
plotted as in Figure 3b. These plots as well as other plots for $\lambda_j$ and $\zeta_j$ suggest the first
1,000 iterations of each run be discarded and the remaining samples be pooled. We used
5,000 iterations as burn-in and the subsequent 5,000 iterations for estimation.

BUGS and BILOG Parameter Estimates 

The posterior mean of the Gibbs sampler was obtained for each parameter. Two different
sets of prior distributions for item parameters were employed in the BUGS runs. The
first set employed an informative prior, $\lambda_j \sim N(0, 1)$, and an uninformative prior,
$\zeta_j \sim N(0, 100^2)$. In addition, a constraint was imposed on the range of $\lambda_j$ to allow
only positive values (i.e., $\lambda_j > 0$). The prior distribution for $\lambda_j$ limits possible values.
Gibbs sampling-informative (GS-I) indicates this informative prior for $\lambda_j$. The second set
employed two uninformative prior distributions, $\lambda_j \sim N(0, 10^2)$ with the constraint $\lambda_j > 0$
and $\zeta_j \sim N(0, 100^2)$. This second set of priors is Gibbs sampling-uninformative (GS-U).

For BILOG runs, two procedures were used: MB/EAP (i.e., marginal Bayesian item
parameter estimation with expected a posteriori ability estimation) and MML/ML (i.e.,
marginal maximum likelihood item parameter estimation with maximum likelihood ability
estimation). The default prior in BILOG for the estimation of item parameters in the 2PL
is only on the item discrimination parameter, $p(\log \alpha_j) = N(\mu_{\log \alpha_j}, \sigma^2_{\log \alpha_j}) = N(0, .5^2)$.
Default options of BILOG yield MB/EAP. For MML/ML, no prior distributions were used
(although, technically speaking, the marginalization required the standard normal prior for
ability).

Insert Tables 3 and 4 about here 



The information in Table 3 indicates that the four estimation methods yielded somewhat 
different item parameter estimates. Differences between estimates from Gibbs sampling with 
informative priors and marginal Bayesian were relatively small, indicating the estimates from 
the methods were comparable. Both Gibbs sampling with uninformative priors and marginal 
maximum likelihood yielded very unstable item parameter estimates. 

The ability estimates and the standard errors from the memory test are presented in 
Table 4. The maximum likelihood method after MML estimation of item parameters yielded 
several unstable estimates. GS-I, GS-U, and MB/EAP yielded relatively similar results. 
Recall that normal priors were used in those three Bayes methods of ability estimation. 

It is important to note that the posterior interval from Gibbs sampling can be constructed
directly from the sampled values rather than from the normal-based method using the standard errors.
Figure 4 shows the trace lines of the 5,000 sampled values of $\lambda_1$ and $\zeta_1$ for Gibbs
sampling-informative (GS-I). The kernel density plots can also be found in Figure 4. Since the distribution
of the sampled values of $\lambda_1$ looks like a truncated normal form, it is also of interest to obtain
the posterior interval directly from the sampled values. The 95% posterior intervals of the
GS-I and MB are presented in Table 5. Table 6 presents the ability estimates and the 95% 
posterior intervals. It is important to notice that GS-I may yield different ability estimates 
for examinees who had the same response pattern (e.g., examinees 1 to 5). 
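
The difference between a normal-based interval and an interval read directly from sampled values can be illustrated with a small sketch. The draws below are simulated from a half normal distribution as a stand-in for sampled values of a positively constrained slope; they are not output from the memory test analysis, and the order-statistic rule used for the percentiles is one simple choice among several.

```python
import random
from statistics import mean, stdev


def percentile_interval(draws, level=0.95):
    """Equal-tailed interval taken from the order statistics of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(round(tail * (n - 1)))]
    upper = ordered[int(round((1.0 - tail) * (n - 1)))]
    return lower, upper


rng = random.Random(3)
# Hypothetical stand-in for 5,000 sampled values of a slope constrained to be positive.
draws = [abs(rng.gauss(0.0, 1.0)) for _ in range(5000)]

pct_lo, pct_hi = percentile_interval(draws)
norm_lo = mean(draws) - 1.96 * stdev(draws)
norm_hi = mean(draws) + 1.96 * stdev(draws)
```

With a skewed, bounded distribution such as this one, the normal-based lower limit falls below zero, outside the permitted range, while the interval taken from the sampled values respects the constraint.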



Insert Figure 4 and Tables 5 and 6 about here 



Method 



Simulation Conditions 

Although the example presented above is informative, it does not provide enough information 
with regard to comparative characteristics of item and ability parameter estimates of Gibbs 
sampling. A standard method for examining such characteristics is based on studies of 
parameter recovery employing simulated data (e.g., Hulin, Lissak, & Drasgow, 1982; Yen,
1983). Hence, data were simulated under the following conditions: the number of examinees
($N$ = 50, 100, 200) and the number of items ($n$ = 10, 20, 40). Due to the small sample sizes,
informative priors were employed in the two estimation methods. The sample sizes and the 
test lengths were selected to emulate a situation in which estimation procedures and priors 
might have some impact upon item parameter estimates (e.g., Harwell & Janosky, 1991). 
Sample size and test length were completely crossed to yield nine conditions. 

For the Gibbs sampling procedure, an informative prior was used: $\lambda_j \sim N(0, 1)$ with the
constraint $\lambda_j > 0$ and $\zeta_j \sim N(0, 100^2)$. For MB estimation via BILOG, the default priors
were used with EAP estimation of ability. We denote these two methods as Gibbs sampling
and marginal Bayesian (MB) estimation.

Data Generation 

Item response vectors were generated via the computer program GENIRV (Baker, 1982) for 
the 2PL model. The generating parameters for item discrimination were distributed with 
mean 1.00 and variance .09 (i.e., standard deviation .3), and the underlying item difficulty 
parameters were distributed normal with mean 0 and variance 1. Item discrimination and 
item difficulty parameters for the 10-, 20-, and 40-item tests are presented in Tables 7, 
8, and 9, respectively. Item discrimination and difficulty parameters were not correlated. 
The distribution of the underlying ability parameters was normal (0, 1) and,
consequently, matched the distribution of item difficulty. One hundred replications were
generated for each of the sample size and test length conditions. Nine hundred GENIRV 
runs were needed to obtain the data sets for the study. 
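
The data generation step can be sketched as follows. This is not the GENIRV program; it is a minimal illustration that draws abilities from a standard normal distribution and scores each response as correct when a uniform random number falls below the 2PL probability. The function name, the seed, and the three-item parameter values are hypothetical.

```python
import math
import random


def simulate_2pl(num_examinees, alphas, betas, seed=0):
    """Generate a 0/1 response matrix under the 2PL with N(0, 1) abilities."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_examinees):
        theta = rng.gauss(0.0, 1.0)
        row = []
        for a, b in zip(alphas, betas):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


# Hypothetical item parameters: one easy, one medium, one hard item.
alphas = [1.0, 1.0, 1.0]
betas = [-1.5, 0.0, 1.5]
responses = simulate_2pl(200, alphas, betas, seed=11)
prop_correct = [
    sum(row[j] for row in responses) / len(responses) for j in range(len(betas))
]
```

Repeating the call with different seeds would give independent replications for a given sample size and test length, in the spirit of the replicated design described above.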



Insert Tables 7, 8, and 9 about here 



Item Parameter Estimation 

Each of the generated data sets was analyzed via the computer program BILOG (Mislevy 
& Bock, 1990) for MB, and via the computer program BUGS (Spiegelhalter et al., 1997) for 
Gibbs sampling. For example, the generated item response data set for the first replication
of sample size 50 and test length 10 was analyzed by two different computer runs, one each
for the MB and Gibbs sampling procedures.

For MB, a lognormal prior on item discrimination, with mean 0 and variance .25 on the log scale [i.e.,
$\log \alpha_j \sim N(0, .5^2)$], was used. This is the default prior specification in BILOG for estimation
of item parameters in the 2PL model. The ability estimates were obtained by EAP
estimation.

For the Gibbs sampling, an informative prior was used for $\lambda_j$ and an uninformative prior
for $\zeta_j$. The prior distribution for $\lambda_j$ was set to have a normal distribution with mean 0
and variance 1 [i.e., $\lambda_j \sim N(0, 1)$] with range restricted to yield positive values of $\lambda_j$ (i.e.,
$\lambda_j > 0$). The prior distribution for $\zeta_j$ was $\zeta_j \sim N(0, 100^2)$. The prior distribution for $\lambda_j$ can
be seen as a half normal distribution or the singly truncated normal distribution (Johnson,
Kotz, & Balakrishnan, 1994). Since $\lambda_j$, without the range restriction, was sampled from a
unit normal distribution, the restricted prior has $E(\lambda_j) = .798$ and $\mathrm{Var}(\lambda_j) = .363$ (standard deviation .603).
The prior distribution for $\zeta_j$, however, was similar to the uniform distribution defined on
the entire real line. The priors for MB and Gibbs sampling were similar but not exactly the
same.
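
The half normal moments quoted above follow from the standard results that a unit normal restricted to positive values has mean equal to the square root of 2/pi and variance 1 - 2/pi. The sketch below reproduces these values and, for comparison, computes the mean and standard deviation implied on the discrimination scale by a lognormal prior with log-scale mean 0 and standard deviation .5, using the usual lognormal moment formulas. The variable names are illustrative.

```python
import math

# Half normal: unit normal restricted to positive values.
half_normal_mean = math.sqrt(2.0 / math.pi)
half_normal_var = 1.0 - 2.0 / math.pi
half_normal_sd = math.sqrt(half_normal_var)

# Lognormal: log(alpha) ~ N(0, 0.5^2).
sigma = 0.5
lognormal_mean = math.exp(sigma ** 2 / 2.0)
lognormal_var = (math.exp(sigma ** 2) - 1.0) * math.exp(sigma ** 2)
lognormal_sd = math.sqrt(lognormal_var)

print(round(half_normal_mean, 3), round(half_normal_var, 3), round(half_normal_sd, 3))
print(round(lognormal_mean, 3), round(lognormal_sd, 3))
```

The two priors have nearly equal standard deviations on the discrimination scale but different means, which is one way to see that they are similar without being identical.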

Metric Transformation 

In parameter recovery studies, such as the present one, comparisons between estimates and 
the underlying parameters require that the item parameter estimates obtained from different 
calibration runs be placed on a common metric with their underlying parameters (Baker &
Al-Karni, 1991; Yen, 1987). Parameter estimation procedures under IRT yield metrics which
are unique up to a linear transformation. To link both sets of estimates and parameters, it 
is necessary to determine the slope and intercept of the equating coefficients required for the 
transformation. 

The estimates of the item parameters for each of the estimation procedures were placed on 
the scale of the true parameters before comparisons were made. The test characteristic curve 
method by Stocking and Lord (1983) as implemented in the computer program EQUATE 
(Baker, 1993) was used. 
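
The statement that the metric is unique only up to a linear transformation can be verified numerically. The sketch below applies a slope and an intercept to the ability and item parameters of the 2PL and confirms that the response probability is unchanged. It does not implement the Stocking and Lord procedure for finding the coefficients; the slope and intercept values here are arbitrary assumptions, as are the function names.

```python
import math


def p_2pl(theta, alpha, beta):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def rescale_item(alpha, beta, slope, intercept):
    """Place item parameters on the target metric."""
    return alpha / slope, slope * beta + intercept


def rescale_ability(theta, slope, intercept):
    """Place an ability value on the target metric."""
    return slope * theta + intercept


slope, intercept = 1.25, -0.4        # hypothetical equating coefficients
alpha, beta, theta = 0.9, 0.3, 1.1   # hypothetical values on the original metric

alpha_star, beta_star = rescale_item(alpha, beta, slope, intercept)
theta_star = rescale_ability(theta, slope, intercept)

p_original = p_2pl(theta, alpha, beta)
p_rescaled = p_2pl(theta_star, alpha_star, beta_star)
```

Because the probabilities agree on both metrics, the data alone cannot distinguish between them, which is why estimates must be transformed before they are compared with the generating parameters.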

Evaluation Criteria 

The evaluation of accuracy in this study involved three criteria: root mean square error
(RMSE), bias, and correlation between estimates and parameters. The RMSE is the square
root of the average of the squared differences between estimated and true values. For item
discrimination, for example, the RMSE of item j is

$$\left\{ \frac{1}{R} \sum_{k=1}^{R} (\hat{\alpha}_{jk} - \alpha_j)^2 \right\}^{1/2},$$

where R is the total number of replications (i.e., R = 100).

It is also useful to examine the bias, B, between the expected value of the estimates and
the corresponding parameter. The bias of the item discrimination estimates, for example,
is given as $B_{\alpha_j} = E(\hat{\alpha}_{jk}) - \alpha_j$, where the expectation is with regard to k = 1(1)R. This
estimate of bias was obtained for both parameters in the model across the 100 replications.
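A minimal sketch of the two criteria for a single item parameter is given below. The function names and the four estimates are made up for illustration; the study itself used R = 100 replications.

```python
import math

def rmse(estimates, true_value):
    """Square root of the mean squared difference from the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

def bias(estimates, true_value):
    """Mean of the estimates over replications minus the true value."""
    return sum(estimates) / len(estimates) - true_value

# Hypothetical estimates of one discrimination parameter over four replications.
alpha_true = 1.00
alpha_hat = [1.10, 0.90, 1.20, 1.00]

item_rmse = rmse(alpha_hat, alpha_true)
item_bias = bias(alpha_hat, alpha_true)
print(round(item_rmse, 3), round(item_bias, 3))
```

The squared RMSE equals the squared bias plus the variance of the estimates about their mean, so a method can show small bias and still have a large RMSE.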



Results 



RMSEs for Item Parameters 

RMSEs for item parameters of the 10-, 20-, and 40-item tests are reported in Tables 10, 11, 
and 12, respectively. As sample size increased, RMSEs for both item parameters decreased. 



Insert Tables 10, 11, and 12 about here 

The average RMSEs of the 10-, 20-, and 40-item tests are reported in Tables 13, 14, and 
15, respectively. The patterns of the RMSE results were consistent across all tables. RMSE 
results are also presented graphically in Figures 5, 6, and 7. 

Insert Tables 13, 14, and 15, and Figures 5, 6, and 7 about here 



In Gibbs sampling, the RMSEs for item discrimination increased as the values of
discrimination parameters increased. For MB, items with $\alpha_j = .73$ and $\alpha_j = 1.00$ yielded
somewhat smaller RMSEs. Overall, MB consistently yielded smaller RMSEs than did Gibbs
sampling. For item difficulty, the two extreme item difficulties $\beta_j = -1.83$ and $\beta_j = 1.83$
yielded larger RMSEs for both MB and Gibbs sampling. MB also yielded consistently smaller
RMSEs for item difficulty for all conditions.



Bias Results for Item Parameters 

The bias statistics for item discrimination and difficulty, presented in Tables 16, 17, and 18
for the 10-, 20-, and 40-item tests, appear to decrease as sample size increases.



Insert Tables 16, 17, and 18 about here 



Tables 19, 20, and 21 summarize the average sizes of bias for different test lengths. Figures
8, 9, and 10 also present the bias results of the respective tests. Bias statistics decreased with
an increase in sample size for item discrimination. Because priors were placed on the item
discriminations, it was expected that positive bias would be observed for the smaller item discrimination
parameters (i.e., $\alpha_j = .45$ or $\alpha_j = .73$) and negative bias for the larger item discrimination
parameters (i.e., $\alpha_j = 1.27$ and $\alpha_j = 1.55$). This shrinkage effect was observed mainly for
MB; for Gibbs sampling, it was observed only for sample size 50.



Insert Tables 19, 20, and 21, and Figures 8, 9, and 10 about here 



The bias patterns for item difficulty were somewhat different from the patterns for item
discrimination. Items with negative difficulty parameters had negative bias whereas positive
bias was observed for items with positive difficulty parameters. The same pattern was
observed across the three test lengths. MB consistently yielded better bias results than did
Gibbs sampling. The difference between the two methods decreased as the sample sizes
increased.

Correlation Results for Item Parameters 

The average correlations between true and estimated values of both item discrimination and
item difficulty across 100 replications are given in Table 22. As sample sizes increased, the
average correlations increased. Only minor differences occurred between the two estimation
methods: Gibbs sampling yielded better results for item discrimination whereas MB yielded
better results for item difficulty.



Insert Table 22 about here 



RMSEs for Ability Parameters 



The average RMSEs for ability parameters for 50, 100, and 200 examinees are reported in
Tables 23, 24, and 25, respectively. As test length increased, RMSEs for ability parameters
decreased.



Insert Tables 23, 24, and 25, and Figure 11 about here



Figure 11 summarizes the results from Tables 23, 24, and 25. When ability parameters 
were close to zero, Gibbs sampling yielded smaller RMSEs. For extreme ability parameters, 
MB yielded smaller RMSEs. RMSEs decreased around zero, that is, they were smaller 
around the mean of item difficulty parameters. RMSEs increased when ability parameters 
were not well matched with the mean of the item difficulty parameters. 

Bias Results for Ability Parameters 

Tables 26, 27, and 28 summarize the average sizes of bias from 50, 100, and 200 examinees.
Figure 12 presents the bias results for the three sample sizes. For all sample sizes, an increase
in test length was associated with a decrease in bias. Recall that the ability estimation
used in both Gibbs sampling and MB (i.e., EAP) employed priors for ability. It was expected that
positive bias would be observed for the larger negative ability parameters and negative bias
for the larger positive ability parameters. This shrinkage effect was observed, in fact, for
all conditions. Increasing test length reduced the shrinkage effect. MB consistently yielded
smaller bias across all conditions.
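The pull toward the prior mean, and its weakening with test length, can be illustrated with a small EAP computation. This is a sketch under stated assumptions: a N(0, 1) ability prior, an equally spaced grid in place of BILOG's quadrature, and two hypothetical tests made of identical items. It is not the BILOG or BUGS implementation.

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response (logistic metric)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """EAP of theta under a N(0, 1) prior, using an equally spaced grid."""
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)   # unnormalized N(0, 1) density
        for u, (a, b) in zip(responses, items):
            p = p2pl(theta, a, b)
            weight *= p if u == 1 else 1.0 - p
        num += theta * weight
        den += weight
    return num / den

items10 = [(1.0, 0.0)] * 10      # hypothetical 10-item test of identical items
items40 = [(1.0, 0.0)] * 40      # hypothetical 40-item test of identical items
eap10 = eap([1] * 10, items10)   # every item answered correctly, short test
eap40 = eap([1] * 40, items40)   # every item answered correctly, long test
print(round(eap10, 3), round(eap40, 3))
```

For an all-correct pattern the likelihood alone has no finite maximum, so the finite EAP reflects the prior. The estimate from the longer test lies farther from zero, which is the reduced shrinkage noted above.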



Insert Tables 26, 27, and 28, and Figure 12 about here 



Correlation Results for Ability 

The average correlations between true and estimated values of ability parameters over 100 
replications are given in Table 29. As test lengths increased, average correlations increased. 
Differences in correlations were not associated with sample size. Gibbs sampling and MB 
yielded the same results. 



Insert Table 29 about here 



Discussion 

Previous work using Gibbs sampling and other MCMC methods suggests that these methods may provide
a useful alternative for estimation of IRT parameters when small sample sizes and
small numbers of items are used. Even though implementation of the Gibbs sampling method
in IRT is available in several computer programs, the accuracy of the resulting estimates has
not been thoroughly studied. The simulation results of this study indicate that MB via
BILOG yielded better item and ability parameter estimates than Gibbs sampling. This is
consistent with the results reported by Baker (1998).

The main difference between Gibbs sampling and the marginalized methods, MMLE and
MBE, is in the way these methods obtain parameter estimates. Gibbs sampling uses the
sample of parameter values to estimate the mean and variance of the posterior density of the
parameter. Under MML and MB, the marginalized likelihood function and the marginalized
posterior distribution, respectively, are maximized to obtain the marginal modes. Estimates
of the ability parameters do not arise during the course of item parameter estimation under
the marginalized methods. Instead, ability parameters are typically estimated after obtaining
the item parameter estimates, under the assumption that the obtained estimates are true
values. In the Gibbs sampling approach, ability parameters can be estimated jointly with
item parameters as in this paper, and the method is similar, in this sense, to JML or JB.
Note that, in Gibbs sampling, ability can also be estimated after, rather than jointly with,
the item parameters.
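The way a sampler's output is turned into estimates can be sketched as follows. The draws here are simulated from a normal distribution as a stand-in for the retained iterations of a Gibbs run; the function name, the seed, and the generating values are assumptions made only for illustration.

```python
import random
import statistics

def summarize(draws):
    """Posterior mean, SD, and 95% interval computed from sampled values."""
    mean = statistics.fmean(draws)
    sd = statistics.stdev(draws)
    cuts = statistics.quantiles(draws, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
    return mean, sd, (cuts[0], cuts[-1])

# Stand-in for the retained draws of one parameter (not real sampler output).
rng = random.Random(1)
draws = [rng.gauss(1.0, 0.5) for _ in range(5000)]

mean, sd, (lower, upper) = summarize(draws)
print(round(mean, 3), round(sd, 3), round(lower, 3), round(upper, 3))
```

The mean serves as the point estimate, the standard deviation plays the role of a standard error, and the two outer percentiles give a 95% posterior interval without any normality assumption.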

The computer programs BUGS (Spiegelhalter et al., 1997) and CODA (Best et al.,
1997) as well as the accompanying manuals are freely available over the Web. The uniform
resource locator (URL) of the Medical Research Council Biostatistics Unit at the University
of Cambridge is:

http://www.mrc-bsu.cam.ac.uk/bugs/

Gibbs sampling and general MCMC methods are likely to be more useful for situations 
where complicated models are employed. For example, Gibbs sampling could be usefully 
applied to the estimation of item and ability parameters in the hierarchical Bayes approach 
(Mislevy, 1986; Swaminathan & Gifford, 1982, 1985, 1986). In this study, priors were imposed 
directly on the parameters and the priors used for the Gibbs sampling and MB were not 
precisely the same. Accuracy of Gibbs sampling with different kinds of priors has not been
investigated. This kind of research may be particularly valuable for small samples and short
tests.

The focus in this paper was estimation of item and ability parameters in terms of RMSE
and bias. In addition to RMSE and bias, future studies may also consider accuracy with
respect to the posterior intervals of the estimates. This is because one of
the possible advantages of using Gibbs sampling or other MCMC methods is incorporation
of uncertainty in item parameter estimates into estimation of ability parameters (e.g., Patz
& Junker, 1997).

In this paper, we employed the 2PL model in the example and in the simulation section 
without addressing the problem of model selection and criticism. The model criticism for 
Gibbs sampling seems to be an important topic to investigate in future research. Also the 
evaluation of Gibbs sampling for other models including the three-parameter logistic model, 
the partial credit model, and the graded response model may provide guidelines for using 
the method under IRT.

Table 1

Memory Test Data from Thissen (1982)

                       Item
Examinee   1  2  3  4  5  6  7  8  9  10
1          0  0  0  0  0  0  0  0  0  0
2          0  0  0  0  0  0  0  0  0  0
3          0  0  0  0  0  0  0  0  0  0
4          0  0  0  0  0  0  0  0  0  0
5          0  0  0  0  0  0  0  0  0  0
6          0  0  0  0  0  0  0  0  0  1
7          0  0  0  0  0  0  0  0  1  1
8          0  0  0  0  0  0  0  0  1  1
9          0  0  0  0  0  0  0  0  1  1
10         0  0  0  0  0  0  0  1  0  1
11         0  0  0  0  0  0  0  1  0  1
12         0  0  0  0  0  1  0  0  0  1
13         0  0  0  0  1  0  0  0  0  1
14         0  0  0  0  1  0  0  0  1  0
15         0  0  1  0  0  0  0  0  0  1
16         0  0  0  0  0  0  0  1  1  1
17         0  0  0  0  0  0  0  1  1  1
18         0  0  0  0  0  0  1  0  1  1
19         0  0  1  0  0  0  0  1  0  1
20         0  0  1  0  0  0  1  0  0  1
21         0  1  0  0  0  1  0  1  0  0
22         1  0  0  0  0  0  0  0  1  1
23         1  0  0  0  0  0  1  0  0  1
24         1  0  0  1  0  0  0  0  1  0
25         0  0  0  0  0  0  1  1  1  1
26         0  0  0  0  0  1  0  1  1  1
27         0  0  0  0  0  1  0  1  1  1
28         0  0  0  0  1  0  1  0  1  1
29         0  0  0  1  0  0  1  0  1  1
30         0  0  0  1  0  0  1  1  0  1
31         0  1  0  0  0  0  0  1  1  1
32         0  1  0  0  0  1  0  0  1  1
33         0  1  0  0  1  0  0  1  1  0
34         0  1  0  0  0  0  1  1  1  1
35         1  0  0  0  0  1  1  1  0  1
36         1  0  0  1  1  0  1  1  0  0
37         1  1  0  0  1  0  0  1  0  1
38         0  1  0  0  0  1  1  1  1  1
39         1  1  0  0  1  1  0  1  0  1
40         0  1  1  1  1  0  0  1  1  1



Table 2

Starting Values for Item Parameters in the Three Runs of the Gibbs Sampler

          Parameter
Run       $\lambda_j$   $\zeta_j$
First     1             0
Second    10            5
Third     .1            -5

Table 3

Estimated Item Parameters and Standard Errors (s.e.) of the Memory Test Items

Marginal Maximum Likelihood




8 

9 

10 



h. 

.671 


(s.e.) 

(.463) 


Ci 

-1.775 


(s.e.) 

(.510) 


1.416 


(.662) 


-1.753 


(.617) 


.521 


(.419) 


-2.484 


(.614) 


.700 


(.511) 


-2.264 


(.617) 


.782 


(.512) 


-1.640 


(.504) 


.827 


(.536) 


-1.669 


(.524) 


.595 


(.421) 


-1.103 


(.405) 


1.380 


(.633) 


-.163 


(.459) 


.517 

.727 


(.367) 

(.477) 


-.007 

1.270 


(.345) 

(.436) 



.793 (.615) 
27.800(22.320) 
.728 (.604) 
.843 (.667) 
1.256 (.858) 
1.733 (1.124) 
.598 (.437) 
14.520 (1.932) 
.701 (.480) 
1.040 (.647) 



-1.768 (.522) 
-16.860(14.660) 
-2.488 (.630) 
-2.275 (.622) 
-1.741 (.612) 
-1.968 (.799) 
-1.058 (.402) 
-1.629 (4.836) 
.006 (.361) 
1.353 (.494) 



1.413 (.793) 
.769 (.323) 
.906 (.409) 
.932 (.398) 
.933 (.404) 
.834 (.356) 
1.355 (.690) 
.747 (.301) 
.914 (.365) 



-1.655 (.737) 
-2.403 (.659) 
-2.208 (.635) 
-1.606 (.534) 
-1.606 (.537) 
-1.105 (.449) 
-.153 (.472) 
-.004 (.424) 
1.270 (.505) 



2.344 (1.550) 
6.066(30.895) 
.255 (1.932) 
1.395 (3.164) 
1.153 (1.519) 
.465 (.814) 
.177 (.849) 
.761 (.985) 
2.168 (1.415) 
.624 (.910) 



-5.595(13.719) 
-2.072 (1.730) 
-1.619 (.863) 
-1.979 (.951) 
-1.719 (.520) 
-1.138 (.525) 
-.647 (.588) 
1.105 (.922) 
1.046 (1.049) 



Examinee 



Table 4 

Ability Estimates and Standard Errors (s.e.) of the Memory Test 
BILOG 



BUGS 



GS-I“ 



GS-U 



~MB/EAP 



MML/ML 



1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 

26 

27 

28 

29 

30 

31 

32 

33 

34 

35 

36 

37 

38 

39 

40 



Oi (s-e-) 



-1.167 

-1.148 

-1.148 

-1.160 

-1.144 

-.773 

-.509 

-.516 

-.516 

-.129 

-.135 

-.366 

-.379 

-.489 

-.515 

.066 

.080 

-.222 

.116 

-.241 

.478 

-.195 

-.157 

-.195 

.330 

.416 

.419 

.100 

.066 

.403 

.641 

.430 

.659 

.853 

.687 

.690 

.982 

1.189 

1.302 

1.415 



(.788) 

(.793) 

(.779) 

(.776) 

(.780) 

(.751) 

(.734) 

(.737) 

(.754) 

(.712) 

(.709) 

(.752) 

(.753) 

(.770) 

(.772) 

(.702) 

(.700) 

(.734) 

(.714) 

(.737) 

(.746) 

(.731) 

(.731) 

(.782) 

(.687) 

(.706) 

(.699) 

(.726) 

(.744) 

(.700) 

(.707) 

(.701) 

(.722) 

(.671) 

(.693) 

(.750) 

(.694) 

(.683) 

(.716) 

(•711) 



6i 


(s.e.) 


Vi 


is.e.; 


Vi 


(n K/iG\ 


-1.198 


(.728) 


-1.309 


(.738) 


—3.968 


^ . j4y j 


-1.194 


(.718) 


-1.309 


(.738) 


-3.968 


(2.549) 


-1.189 


(.723) 


-1.309 


(.738) 


-3.968 


(2.549) 


-1.196 


(.703) 


-1.309 


(.738) 


-3.968 


(2.549) 


-1.187 


(.722) 


-1.309 


(.738) 


-3.968 


(2.549) 


-.779 


(.631) 


-.840 


(.695) 


-1.873 


(1.434) 


-.557 


(.577) 


-.495 


(.666) 


-.348 


(.622) 


-.560 


(.575) 


-.495 


(.666) 


-.348 


(.622) 


-.566 


(.582) 


-.495 


(.666) 


-.348 


(.622) 


.121 


(.448) 


-.234 


(.646) 


-1.029 


(.822) 


.114 


(.461) 


-.234 


(.646) 


-1.029 


(.822) 


-.331 


(.550) 


-.414 


(.659) 


-1.259 


(.948) 


-.432 


(.563) 


-.414 


(.659) 


-.797 


(.727) 


-.520 


(.598) 


-.487 


(.665) 


-.152 


(.597) 


-.557 


(.596) 


-.485 


(.665) 


-1.476 


(1.097) 


.203 


(.408) 


.069 


(.625) 


-.070 


(.589) 


.212 


(.405) 


.069 


(.625) 


-.070 


(.589) 


-.399 


(.529) 


-.140 


(.640) 


-.281 


(.612) 


.200 


(.415) 


.077 


(.625) 


-.872 


(.754) 


-.401 


(.547) 


-.131 


(.639) 


-1.289 


(.967) 


.890 


(.396) 


.329 


(.609) 


.753 


(.328) 


-.366 


(.525) 


-.126 


(.639) 


.411 


(.491) 


-.398 


(.550) 


-.090 


(.636) 


-.215 


(.604) 


-.416 


(.560) 


-.129 


(.639) 


.568 


(.412) 


.260 


(.385) 


.385 


(.607) 


-.010 


(.583) 


.358 


(.371) 


.421 


(.605) 


.087 


(.572) 


.358 


(.375) 


.421 


(.605) 


.087 


(.572) 



-.176 

-.247 

.269 

.884 

.556 

.905 

.940 

.416 

.368 

1.024 

1.175 

1.308 

1.277 



(.477) 

(.495) 

(.410) 

(.377) 

(.522) 

(.397) 

(.415) 

(.380) 

(.391) 

(.437) 

(.489) 

(.524) 

(•540) 



.227 

.217 

.443 

.595 

.442 

.602 

.894 

.766 

.763 

.972 

1.223 

1.300 

1.519 



(.616) 

(.605) 

(.601) 

(.605) 

(.601) 

(.597) 

(.599) 

(.599) 

(.596) 

(.592) 

(.592) 

(-597) 



.197 

-.285 

.971 

.944 

1.021 

.988 

.199 

.555 

1.106 

1.033 

1.165 

1.354 



(.556) 

(.613) 

(.303) 

(.301) 

(.313) 

(.306) 

(.556) 

(.420) 

(.342) 

(.316) 

(.372) 

(•514) 






Table 5

Estimated Item Parameters and 95% Posterior Intervals of the Memory Test Items

        Gibbs Sampling-Informative                                        Marginal Bayesian
Item    $\lambda_j$  (Post. Interval)    $\zeta_j$   (Post. Interval)     $\lambda_j$  (Post. Interval)    $\zeta_j$   (Post. Interval)
1       .671         (.035, 1.759)       -1.775      (-2.881, -.883)      .869         (.120, 1.621)       -1.760      (-2.856, -.664)
2       1.416        (.219, 2.803)       -1.753      (-3.153, -.733)      1.413        (-.141, 2.974)      -1.655      (-3.100, -.210)
3       .521         (.019, 1.551)       -2.484      (-3.826, -1.434)     .769         (.136, 1.405)       -2.403      (-3.695, -1.111)
4       .700         (.033, 1.894)       -2.264      (-3.597, -1.186)     .906         (.104, 1.711)       -2.208      (-3.453, -.963)
5       .782         (.045, 1.936)       -1.640      (-2.740, -.752)      .932         (.152, 1.716)       -1.606      (-2.653, -.559)
6       .827         (.050, 2.086)       -1.669      (-2.842, -.757)      .933         (.141, 1.728)       -1.606      (-2.659, -.553)
7       .595         (.029, 1.613)       -1.103      (-1.947, -.371)      .834         (.136, 1.535)       -1.105      (-1.985, -.225)
8       1.380        (.272, 2.765)       -.163       (-1.089, .739)       1.355        (.003, 2.714)       -.153       (-1.078, .772)
9       .517         (.027, 1.405)       -.007       (-.694, .670)        .747         (.157, 1.340)       -.004       (-.835, .827)
10      .727         (.045, 1.819)       1.270       (.492, 2.182)        .914         (.199, 1.633)       1.270       (.280, 2.260)



Table 6

Ability Estimates and 95% Posterior Intervals of the Memory Test

          Gibbs Sampling-Informative       MML/Expected A Posteriori
Examinee  $\theta_i$   Posterior Interval  $\theta_i$   Posterior Interval
1         -1.167       (-2.736, .339)      -1.309       (-2.755, .138)
2         -1.148       (-2.788, .334)      -1.309       (-2.755, .138)
3         -1.148       (-2.716, .324)      -1.309       (-2.755, .138)
4         -1.160       (-2.772, .290)      -1.309       (-2.755, .138)
5         -1.144       (-2.732, .324)      -1.309       (-2.755, .138)
6         -.773        (-2.366, .610)      -.840        (-2.202, .522)
7         -.509        (-2.027, .883)      -.495        (-1.799, .809)
8         -.516        (-2.037, .859)      -.495        (-1.799, .809)
9         -.516        (-2.075, .870)      -.495        (-1.799, .809)
10        -.129        (-1.589, 1.216)     -.234        (-1.500, 1.033)
11        -.135        (-1.630, 1.141)     -.234        (-1.500, 1.033)
12        -.366        (-1.943, 1.003)     -.414        (-1.706, .879)
13        -.379        (-1.917, 1.071)     -.414        (-1.706, .878)
14        -.489        (-2.081, .975)      -.487        (-1.790, .816)
15        -.515        (-2.089, .960)      -.485        (-1.788, .818)
16        .066         (-1.420, 1.408)     .069         (-1.157, 1.294)
17        .080         (-1.359, 1.440)     .069         (-1.157, 1.294)
18        -.222        (-1.716, 1.197)     -.140        (-1.394, 1.114)
19        .116         (-1.339, 1.533)     .077         (-1.148, 1.302)
20        -.241        (-1.734, 1.167)     -.131        (-1.384, 1.122)
21        .478         (-1.084, 1.854)     .329         (-.865, 1.524)
22        -.195        (-1.695, 1.187)     -.126        (-1.378, 1.126)
23        -.157        (-1.620, 1.277)     -.090        (-1.338, 1.157)
24        -.195        (-1.765, 1.309)     -.129        (-1.382, 1.124)
25        .330         (-1.093, 1.616)     .385         (-.805, 1.574)
26        .416         (-1.034, 1.781)     .421         (-.766, 1.607)
27        .419         (-.966, 1.763)      .421         (-.766, 1.607)
28        .100         (-1.393, 1.508)     .227         (-.979, 1.432)
29        .066         (-1.419, 1.509)     .217         (-.990, 1.423)
30        .403         (-.970, 1.800)      .443         (-.742, 1.628)
31        .641         (-.747, 2.018)      .595         (-.582, 1.772)
32        .430         (-.974, 1.789)      .442         (-.743, 1.627)
33        .659         (-.839, 2.045)      .602         (-.576, 1.779)
34        .853         (-.486, 2.154)      .894         (-.276, 2.064)
35        .687         (-.681, 2.007)      .766         (-.407, 1.939)
36        .690         (-.813, 2.139)      .763         (-.410, 1.936)
37        .982         (-.379, 2.322)      .972         (-.195, 2.139)
38        1.189        (-.138, 2.545)      1.223        (.063, 2.384)
39        1.302        (-.094, 2.722)      1.300        (.140, 2.460)
40        1.415        (.033, 2.826)       1.519        (.349, 2.689)



Table 7

Item Parameters of the 10 Item Test

      Parameter
Item  $\alpha_j$   $\beta_j$
1     .45          .00
2     .73          -.91
3     .73          .91
4     1.00         -1.83
5     1.00         .00
6     1.00         .00
7     1.00         1.83
8     1.27         -.91
9     1.27         .91
10    1.55         .00



Table 8

Item Parameters of the 20 Item Test

      Parameter
Item  $\alpha_j$   $\beta_j$
1     .45          -.91
2     .45          .91
3     .73          -1.83
4     .73          .00
5     .73          .00
6     .73          1.83
7     1.00         -.91
8     1.00         -.91
9     1.00         .00
10    1.00         .00
11    1.00         .00
12    1.00         .00
13    1.00         .91
14    1.00         .91
15    1.27         -1.83
16    1.27         .00
17    1.27         .00
18    1.27         1.83
19    1.55         -.91
20    1.55         .91



Table 9

Item Parameters of the 40 Item Test

      Parameter
Item  $\alpha_j$   $\beta_j$
1     .45          -.91
2     .45          .00
3     .45          .00
4     .45          .91
5     .73          -1.83
6     .73          -.91
7     .73          -.91
8     .73          .00
9     .73          .00
10    .73          .91
11    .73          .91
12    .73          1.83
13    1.00         -1.83
14    1.00         -1.83
15    1.00         -.91
16    1.00         -.91
17    1.00         .00
18    1.00         .00
19    1.00         .00
20    1.00         .00
21    1.00         .00
22    1.00         .00
23    1.00         .00
24    1.00         .00
25    1.00         .91
26    1.00         .91
27    1.00         1.83
28    1.00         1.83
29    1.27         -1.83
30    1.27         -.91
31    1.27         -.91
32    1.27         .00
33    1.27         .00
34    1.27         .91
35    1.27         .91
36    1.27         1.83
37    1.55         -.91
38    1.55         .00
39    1.55         .00
40    1.55         .91



Table 10

Root Mean Square Errors of the 10 Item Test

      Gibbs Sampling                                                    Marginal Bayesian
      N = 50                N = 100               N = 200               N = 50                N = 100               N = 200
Item  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$
1     .358       .585       .281       .491       .189       .382       .338       .433       .273       .322       .196       .248
2     .357       .573       .305       .418       .231       .298       .242       .404       .219       .294       .177       .239
3     .365       .507       .335       .426       .242       .300       .257       .383       .236       .312       .184       .217
4     .381       .861       .372       .679       .290       .524       .245       .487       .260       .422       .222       .375
5     .412       .271       .342       .198       .242       .141       .257       .273       .226       .200       .181       .144
6     .472       .343       .370       .206       .269       .163       .311       .337       .255       .208       .206       .165
7     .358       .827       .365       .603       .313       .529       .217       .438       .253       .391       .228       .332
8     .400       .428       .396       .276       .313       .218       .311       .384       .310       .264       .261       .207
9     .425       .452       .391       .293       .290       .194       .323       .367       .300       .281       .263       .196
10    .420       .260       .361       .149       .330       .124       .425       .266       .374       .161       .316       .130












Table 11

Root Mean Square Errors of the 20 Item Test

      Gibbs Sampling                                                    Marginal Bayesian
      N = 50                N = 100               N = 200               N = 50                N = 100               N = 200
Item  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$
1     .396       .719       .233       .694       .161       .572       .358       .500       .236       .389       .166       .309
2     .344       .856       .260       .578       .170       .592       .320       .521       .255       .377       .175       .341
3     .377       .842       .299       .727       .186       .531       .281       .499       .220       .387       .141       .313
4     .389       .480       .341       .381       .202       .197       .269       .379       .254       .302       .164       .189
5     .369       .436       .314       .277       .219       .205       .247       .371       .234       .260       .180       .197
6     .429       1.016      .280       .831       .205       .697       .301       .529       .202       .405       .155       .396
7     .380       .460       .341       .331       .208       .235       .243       .376       .244       .286       .162       .220
8     .378       .388       .333       .326       .246       .239       .248       .356       .242       .291       .199       .209
9     .314       .330       .282       .214       .243       .169       .200       .324       .206       .212       .202       .172
10    .391       .327       .323       .234       .223       .139       .257       .327       .231       .232       .181       .143
11    .381       .308       .345       .234       .237       .163       .270       .305       .243       .233       .195       .167
12    .446       .348       .365       .254       .228       .152       .316       .343       .265       .254       .182       .157
13    .406       .483       .329       .274       .231       .240       .278       .418       .232       .228       .184       .219
14    .425       .716       .292       .354       .215       .226       .269       .432       .206       .299       .170       .213
15    .443       1.034      .432       .744       .292       .360       .336       .672       .353       .533       .258       .336
16    .438       .264       .344       .168       .240       .127       .327       .278       .273       .181       .197       .134
17    .409       .255       .311       .192       .275       .127       .325       .270       .265       .204       .237       .133
18    .403       .819       .394       .645       .274       .406       .312       .588       .314       .456       .237       .375
19    .426       .335       .442       .279       .340       .178       .436       .360       .408       .283       .314       .192
20    .382       .315       .368       .223       .361       .207       .374       .327       .337       .224       .333       .216



Table 12

Root Mean Square Errors of the 40 Item Test

      Gibbs Sampling                                                    Marginal Bayesian
      N = 50                N = 100               N = 200               N = 50                N = 100               N = 200
Item  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$
1     .351       .800       .253       .665       .158       .427       .327       .535       .250       .398       .150       .288
2     .362       .642       .258       .461       .183       .325       .335       .489       .256       .339       .185       .264
3     .369       .648       .221       .494       .151       .294       .341       .462       .229       .366       .154       .240
4     .311       .838       .206       .646       .152       .511       .306       .564       .209       .400       .150       .352
5     .380       .956       .311       .903       .213       .598       .269       .530       .231       .459       .170       .369
6     .337       .556       .287       .425       .205       .283       .240       .399       .214       .300       .167       .242
7     .344       .639       .283       .659       .193       .321       .237       .487       .212       .393       .158       .269
8     .357       .531       .219       .303       .191       .240       .253       .436       .160       .287       .155       .231
9     .338       .429       .306       .308       .199       .203       .231       .386       .233       .285       .161       .195
10    .364       .572       .266       .566       .176       .280       .260       .422       .193       .355       .143       .237
11    .383       .588       .240       .573       .185       .358       .276       .471       .172       .320       .146       .275
12    .329       .980       .296       .824       .239       .628       .232       .536       .218       .465       .189       .388
13    .415       .717       .322       .685       .279       .446       .285       .465       .242       .464       .232       .361
14    .398       1.060      .307       .649       .253       .424       .253       .574       .221       .441       .203       .341
15    .413       .495       .316       .351       .229       .210       .281       .381       .231       .295       .182       .187
16    .426       .557       .304       .489       .259       .299       .298       .443       .226       .370       .215       .243
17    .382       .326       .311       .204       .184       .156       .251       .331       .218       .206       .154       .159
18    .356       .324       .292       .255       .212       .151       .229       .308       .228       .259       .178       .154
19    .397       .324       .259       .234       .215       .168       .291       .320       .195       .240       .176       .173
20    .401       .346       .326       .251       .200       .158       .254       .356       .254       .251       .169       .162
21    .370       .331       .293       .210       .233       .133       .251       .329       .217       .218       .187       .138
22    .365       .318       .317       .238       .191       .165       .242       .326       .243       .244       .155       .170
23    .363       .368       .267       .305       .199       .168       .250       .348       .207       .266       .172       .170
24    .372       .436       .318       .219       .233       .135       .242       .381       .241       .225       .190       .139
25    .412       .510       .364       .305       .233       .253       .288       .410       .274       .278       .187       .232
26    .343       .550       .304       .351       .207       .244       .229       .391       .225       .304       .173       .226
27    .429       .780       .337       .645       .242       .428       .299       .519       .243       .428       .195       .322
28    .402       .838       .291       .626       .218       .397       .268       .515       .208       .457       .173       .321
29    .433       1.056      .427       .691       .310       .506       .330       .719       .356       .521       .268       .430
30    .427       .362       .324       .231       .217       .158       .340       .336       .263       .231       .194       .166
31    .402       .414       .311       .269       .276       .172       .306       .382       .252       .269       .241       .173
32    .419       .342       .277       .210       .213       .143       .325       .343       .229       .226       .191       .150
33    .435       .262       .328       .186       .210       .138       .318       .278       .264       .198       .183       .146
34    .370       .398       .313       .257       .268       .175       .298       .384       .258       .271       .235       .177
35    .419       .371       .373       .320       .247       .179       .311       .375       .301       .285       .238       .190
36    .402       .787       .376       .609       .277       .313       .308       .627       .315       .492       .245       .302
37    .414       .365       .374       .230       .314       .157       .381       .391       .373       .252       .299       .168
38    .417       .234       .310       .162       .276       .114       .386       .258       .316       .175       .257       .119
39    .398       .234       .341       .150       .266       .111       .378       .254       .335       .160       .254       .118
40    .405       .293       .331       .218       .278       .154       .381       .318       .302       .240       .259       .181



Table 13

Average Root Mean Square Errors of the 10 Item Test

                      Gibbs Sampling                Marginal Bayesian
Parameter             N = 50    N = 100   N = 200   N = 50    N = 100   N = 200
$\alpha_j$ = .45      .358      .281      .189      .338      .273      .196
$\alpha_j$ = .73      .361      .320      .237      .250      .228      .181
$\alpha_j$ = 1.00     .406      .362      .279      .258      .249      .209
$\alpha_j$ = 1.27     .413      .394      .302      .317      .305      .262
$\alpha_j$ = 1.55     .420      .361      .330      .425      .374      .316
$\beta_j$ = -1.83     .861      .679      .524      .487      .422      .375
$\beta_j$ = -.91      .501      .347      .258      .394      .279      .223
$\beta_j$ = .00       .365      .261      .203      .327      .223      .172
$\beta_j$ = .91       .480      .360      .247      .375      .297      .207
$\beta_j$ = 1.83      .827      .603      .529      .438      .391      .332








Table 14

Average Root Mean Square Errors of the 20 Item Test

                      Gibbs Sampling                Marginal Bayesian
Parameter             N = 50    N = 100   N = 200   N = 50    N = 100   N = 200
$\alpha_j$ = .45      .370      .247      .166      .339      .246      .171
$\alpha_j$ = .73      .391      .309      .203      .275      .228      .160
$\alpha_j$ = 1.00     .390      .326      .229      .260      .234      .184
$\alpha_j$ = 1.27     .423      .370      .270      .325      .301      .232
$\alpha_j$ = 1.55     .404      .405      .351      .405      .373      .324
$\beta_j$ = -1.83     .938      .736      .446      .586      .460      .325
$\beta_j$ = -.91      .476      .408      .306      .398      .312      .233
$\beta_j$ = .00       .344      .244      .160      .325      .235      .162
$\beta_j$ = .91       .593      .357      .316      .425      .282      .247
$\beta_j$ = 1.83      .918      .738      .552      .559      .431      .386








Table 15

Average Root Mean Square Errors of the 40 Item Test

                      Gibbs Sampling                Marginal Bayesian
Parameter             N = 50    N = 100   N = 200   N = 50    N = 100   N = 200
$\alpha_j$ = .45      .348      .235      .161      .327      .236      .160
$\alpha_j$ = .73      .354      .276      .200      .250      .204      .161
$\alpha_j$ = 1.00     .390      .308      .224      .263      .230      .184
$\alpha_j$ = 1.27     .413      .341      .252      .317      .280      .224
$\alpha_j$ = 1.55     .409      .339      .284      .382      .332      .267
$\beta_j$ = -1.83     .947      .732      .494      .572      .471      .375
$\beta_j$ = -.91      .524      .415      .253      .419      .314      .217
$\beta_j$ = .00       .381      .262      .175      .350      .247      .171
$\beta_j$ = .91       .515      .405      .269      .417      .307      .234
$\beta_j$ = 1.83      .846      .676      .442      .549      .461      .333



Table 16

Bias Results of the 10 Item Test

      Gibbs Sampling                                                    Marginal Bayesian
      N = 50                N = 100               N = 200               N = 50                N = 100               N = 200
Item  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$  $\alpha_j$ $\beta_j$
1     .200       -.045      .107       -.026      .059       .005       .285       -.034      .214       -.024      .153       -.008
2     .135       -.029      .071       -.008      .065       .022       .136       .068       .091       .073       .075       .061
3     .105       .048       .094       .054       .055       .050       .124       -.059      .106       -.027      .070       -.003
4     .054       -.255      .046       -.212      .018       -.154      .001       -.143      .006       -.155      -.003      -.126
5     .148       .000       .105       .019       .080       .011       .044       -.002      .020       .015       .023       .010
6     .187       .019       .080       -.016      .048       -.009      .076       .012       .002       -.020      -.007      -.008
7     .073       .220       .103       .098       .091       .058       .005       .144       .041       .087       .045       .060
8     .039       -.083      .063       -.028      .021       -.036      -.106      -.136      -.079      -.096      -.074      -.084
9     -.005      .100       .075       .029       -.026      .050       -.136      .127       -.064      .092       -.110      .096
10    -.108      .026       -.033      .009       .010       -.018      -.290      .023       -.213      .010       -.116      -.021














Table 17

Bias Results of the 20 Item Test

                   Gibbs Sampling                           Marginal Bayesian
         N = 50        N = 100       N = 200       N = 50        N = 100       N = 200
Item    α_j    β_j    α_j    β_j    α_j    β_j    α_j    β_j    α_j    β_j    α_j    β_j
   1   .235   .048   .083  -.102   .034  -.136   .302   .237   .189   .164   .127   .101
   2   .176   .015   .095   .087   .040   .094   .266  -.218   .198  -.154   .132  -.134
   3   .134  -.144   .049  -.181  -.005  -.124   .153   .074   .086   .039   .033   .019
   4   .162   .017   .103  -.008   .044  -.010   .154   .002   .106  -.013   .055  -.010
   5   .132   .041   .100   .031   .057   .016   .133   .029   .105   .023   .066   .012
   6   .128   .125   .054   .166   .012   .148   .149  -.182   .087  -.072   .046  -.015
   7   .102  -.015   .126   .011   .063  -.016   .018  -.020   .048  -.017   .016  -.045
   8   .107  -.029   .043  -.033   .030   .019   .025  -.048  -.011  -.047  -.010   .002
   9   .052   .014   .043   .027   .051   .014  -.019   .011  -.020   .027   .008   .014
  10   .132   .059   .090  -.022   .047  -.011   .038   .058   .021  -.023   .003  -.011
  11   .095   .009   .101   .005   .046   .034   .012   .003   .025   .004   .002   .036
  12   .100   .044   .059   .012   .056  -.021   .022   .043  -.004   .011   .008  -.021
  13   .109   .055   .098   .024   .050  -.012   .029   .057   .026   .051   .008   .011
  14   .081   .189   .034   .114   .042   .013   .009   .119  -.021   .126  -.001   .037
  15  -.033  -.451   .043  -.232   .007  -.087  -.121  -.371  -.044  -.247  -.058  -.149
  16   .108   .023   .105   .002   .079   .001  -.051   .024  -.032   .001  -.007   .002
  17   .024  -.007   .034   .005   .060  -.012  -.114  -.004  -.086   .006  -.027  -.012
  18  -.024   .240   .002   .135  -.004   .117  -.126   .235  -.089   .167  -.070   .177
  19  -.100  -.099   .033  -.040   .027  -.019  -.264  -.180  -.117  -.111  -.081  -.072
  20  -.026   .047   .025  -.026   .037   .021  -.215   .132  -.137   .047  -.070   .073






Table 18

Bias Results of the 40 Item Test

                   Gibbs Sampling                           Marginal Bayesian
         N = 50        N = 100       N = 200       N = 50        N = 100       N = 200
Item    α_j    β_j    α_j    β_j    α_j    β_j    α_j    β_j    α_j    β_j    α_j    β_j
   1   .195   .028   .096  -.107   .009  -.103   .275   .230   .194   .153   .103   .114
   2   .190   .054   .107   .005   .060  -.006   .276   .041   .200   .010   .137   .000
   3   .183   .114   .098   .030   .030   .012   .274   .079   .189   .011   .115   .003
   4   .168  -.098   .053   .063   .014   .041   .262  -.281   .165  -.196   .107  -.173
   5   .161  -.146   .047  -.229   .022  -.126   .163   .065   .091   .044   .053   .016
   6   .124  -.037   .085  -.028   .046  -.007   .131   .057   .094   .034   .058   .025
   7   .082  -.016   .081  -.058   .055  -.008   .105   .090   .094   .040   .061   .019
   8   .085   .103   .038   .046   .040   .022   .107   .099   .056   .051   .047   .021
   9   .138  -.034   .115  -.027   .047  -.023   .133  -.028   .113  -.031   .053  -.020
  10   .139  -.062   .048   .038   .020   .000   .137  -.143   .071  -.057   .037  -.034
  11   .160   .001   .032   .154   .038   .053   .155  -.075   .058   .047   .051   .010
  12   .104   .179   .065   .141   .041   .076   .125  -.071   .096  -.082   .065  -.058
  13   .122  -.132   .057  -.167   .027  -.105   .050  -.091   .020  -.138   .005  -.107
  14   .084  -.266   .047  -.118   .047  -.032   .025  -.142   .009  -.093   .018  -.046
  15   .106  -.069   .101  -.013   .074   .020   .030  -.057   .032  -.036   .031  -.003
  16   .133  -.097   .047  -.087   .023  -.033   .053  -.088  -.005  -.088  -.010  -.042
  17   .121   .021   .109  -.002   .038  -.001   .029   .025   .032  -.006  -.003  -.002
  18   .095  -.030   .042  -.012   .022  -.024   .015  -.027  -.014  -.018  -.021  -.023
  19   .082  -.013   .063   .016   .051   .000   .010  -.001   .001   .017   .008   .000
  20   .157  -.055   .049  -.002   .017  -.014   .048  -.056  -.010  -.001  -.021  -.013
  21   .089   .011   .065  -.019   .066  -.007   .011   .008   .001  -.019   .014  -.008
  22   .095   .024   .097   .003   .045  -.005   .006   .024   .025   .002  -.001  -.006
  23   .004  -.002   .006  -.049  -.004  -.017  -.043  -.004  -.038  -.043  -.040  -.017
  24   .085   .001   .075   .009   .049  -.005   .009  -.003   .012   .007   .004  -.006
  25   .093   .107   .117   .012   .070   .015   .023   .095   .048   .038   .025   .039
  26  -.035   .177   .041   .061   .009   .061  -.068   .125  -.010   .073  -.024   .081
  27   .139   .140   .086   .064   .016   .100   .072   .083   .040   .041  -.004   .097
  28   .102   .170   .032   .144   .053   .004   .040   .093  -.006   .125   .024   .025
  29  -.065  -.438  -.066  -.273  -.003  -.123  -.146  -.367  -.118  -.261  -.059  -.162
  30   .093  -.037   .076   .012   .031   .007  -.053  -.097  -.037  -.048  -.046  -.038
  31   .051  -.055   .053   .006   .065   .030  -.085  -.104  -.062  -.054  -.012  -.014
  32   .029   .013   .059  -.007   .038   .005  -.110   .013  -.061  -.008  -.037   .006
  33   .119   .035   .084   .021   .043   .000  -.041   .040  -.035   .021  -.039   .000
  34   .000   .101   .063   .032   .048  -.017  -.124   .154  -.063   .102  -.028   .026
  35   .090   .030   .040   .023   .010   .017  -.073   .100  -.066   .067  -.058   .061
  36  -.005   .310   .017   .181   .011   .060  -.101   .315  -.062   .223  -.047   .118
  37  -.009  -.093   .007  -.021   .008  -.010  -.198  -.180  -.130  -.095  -.090  -.059
  38   .037  -.022  -.013   .012   .042   .011  -.173  -.022  -.172   .012  -.062   .012
  39   .000  -.015  -.014   .003   .048   .004  -.202  -.015  -.159   .004  -.060   .004
  40   .026   .015   .063   .001   .031   .048  -.168   .107  -.096   .078  -.069   .100






Table 19

Average Bias Results of the 10 Item Test

                    Gibbs Sampling           Marginal Bayesian
Parameter      N = 50  N = 100  N = 200   N = 50  N = 100  N = 200
α_j =   .45      .200     .107     .059     .285     .214     .153
        .73      .120     .083     .060     .130     .099     .073
       1.00      .116     .084     .059     .032     .017     .015
       1.27      .017     .069    -.003    -.121    -.072    -.092
       1.55     -.108    -.033     .010    -.290    -.213    -.116
β_j = -1.83     -.255    -.212    -.154    -.143    -.155    -.126
       -.91     -.056    -.018    -.007    -.034    -.012    -.012
        .00     -.000    -.004    -.003    -.000    -.005    -.007
        .91      .074     .042     .050     .034     .033     .047
       1.83      .220     .098     .058     .144     .087     .060








Table 20

Average Bias Results of the 20 Item Test

                    Gibbs Sampling           Marginal Bayesian
Parameter      N = 50  N = 100  N = 200   N = 50  N = 100  N = 200
α_j =   .45      .206     .089     .037     .284     .194     .130
        .73      .139     .077     .027     .147     .096     .050
       1.00      .097     .074     .048     .017     .008     .004
       1.27      .019     .046     .036    -.103    -.063    -.041
       1.55     -.063     .029     .032    -.240    -.127    -.076
β_j = -1.83     -.298    -.207    -.106    -.149    -.104    -.065
       -.91     -.024    -.041    -.038    -.003    -.003    -.004
        .00      .025     .007     .001     .021     .005     .001
        .91      .077     .050     .029     .023     .018    -.003
       1.83      .183     .151     .133     .027     .048     .081








Table 21

Average Bias Results of the 40 Item Test

                    Gibbs Sampling           Marginal Bayesian
Parameter      N = 50  N = 100  N = 200   N = 50  N = 100  N = 200
α_j =   .45      .184     .089     .028     .272     .187     .116
        .73      .124     .064     .039     .132     .084     .053
       1.00      .092     .065     .038     .019     .009     .000
       1.27      .039     .041     .030    -.092    -.063    -.041
       1.55      .014     .011     .032    -.185    -.139    -.070
β_j = -1.83     -.246    -.197    -.097    -.134    -.112    -.075
       -.91     -.047    -.037    -.013    -.019    -.012     .000
        .00      .013     .002    -.003     .011     .001    -.003
        .91      .034     .048     .027     .010     .019     .014
       1.83      .200     .133     .060     .105     .077     .046



Table 22

Average Correlations Between Item Parameters and Estimates over 100 Replications

                       Gibbs Sampling                           Marginal Bayesian
             N = 50        N = 100       N = 200       N = 50        N = 100       N = 200
Test       r_αα̂  r_ββ̂   r_αα̂  r_ββ̂   r_αα̂  r_ββ̂   r_αα̂  r_ββ̂   r_αα̂  r_ββ̂   r_αα̂  r_ββ̂
10-item    .503   .920   .624   .950   .737   .968   .499   .948   .615   .969   .738   .980
20-item    .521   .899   .658   .937   .788   .961   .520   .930   .653   .960   .782   .975
40-item    .561   .892   .686   .927   .801   .963   .554   .927   .679   .955   .797   .974
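
The summary statistics reported in the tables above (bias, root mean square error, and the correlation between generating parameters and their estimates) can be illustrated with a minimal sketch. It assumes the standard definitions: bias as the mean estimate minus the generating value, RMSE as the square root of the mean squared deviation from the generating value, and the correlation computed within each replication and then averaged. The function names and the example numbers are hypothetical and are not taken from the study.

```python
import math


def bias(estimates, true_value):
    """Mean estimate across replications minus the generating value."""
    return sum(estimates) / len(estimates) - true_value


def rmse(estimates, true_value):
    """Square root of the mean squared deviation from the generating value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_correlation(true_params, estimates_by_replication):
    """Correlate true parameters with each replication's estimates, then average."""
    rs = [pearson(true_params, est) for est in estimates_by_replication]
    return sum(rs) / len(rs)


# Hypothetical estimates of one discrimination parameter whose true value is 1.00.
alpha_hat = [1.10, 0.90, 1.30]
print(round(bias(alpha_hat, 1.00), 3), round(rmse(alpha_hat, 1.00), 3))
```

A positive bias indicates overestimation on average, while RMSE combines bias and sampling variability, which is why the two sets of tables can rank the methods differently for the same parameter.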






Table 23

Average Root Mean Square Errors of Ability for 50 Examinees

              Gibbs Sampling           Marginal Bayesian
     θ   n = 10   n = 20   n = 40   n = 10   n = 20   n = 40
  -2.5    1.284     .962     .679    1.059     .745     .500
  -2.0     .974     .730     .550     .812     .582     .433
  -1.5     .726     .572     .434     .646     .508     .386
  -1.0     .597     .469     .368     .586     .470     .381
   -.5     .509     .437     .321     .559     .480     .355
    .0     .507     .420     .309     .585     .478     .354
    .5     .521     .441     .322     .579     .479     .353
   1.0     .574     .493     .370     .566     .494     .371
   1.5     .729     .529     .429     .635     .466     .366
   2.0     .863     .691     .555     .697     .544     .437
   2.5    1.248     .961     .696    1.022     .740     .519



Table 24

Average Root Mean Square Errors of Ability for 100 Examinees

              Gibbs Sampling           Marginal Bayesian
     θ   n = 10   n = 20   n = 40   n = 10   n = 20   n = 40
  -2.5    1.265     .928     .651    1.086     .773     .523
  -2.0     .963     .691     .543     .840     .590     .456
  -1.5     .732     .558     .434     .664     .509     .404
  -1.0     .589     .470     .366     .584     .475     .371
   -.5     .509     .418     .319     .551     .448     .338
    .0     .481     .408     .307     .536     .452     .338
    .5     .524     .406     .327     .563     .434     .349
   1.0     .588     .463     .372     .581     .463     .375
   1.5     .737     .560     .428     .676     .511     .394
   2.0     .950     .717     .467     .823     .616     .392
   2.5    1.247     .937     .631    1.075     .776     .505



Table 25

Average Root Mean Square Errors of Ability for 200 Examinees

              Gibbs Sampling           Marginal Bayesian
     θ   n = 10   n = 20   n = 40   n = 10   n = 20   n = 40
  -2.5    1.218     .885     .630    1.112     .795     .556
  -2.0     .936     .669     .490     .859     .608     .444
  -1.5     .703     .532     .407     .662     .508     .388
  -1.0     .571     .451     .343     .570     .454     .343
   -.5     .514     .419     .326     .540     .437     .339
    .0     .502     .412     .317     .536     .440     .336
    .5     .503     .421     .315     .529     .438     .328
   1.0     .563     .465     .342     .560     .467     .345
   1.5     .701     .542     .406     .663     .516     .386
   2.0     .898     .647     .479     .824     .581     .434
   2.5    1.192     .871     .604    1.091     .776     .527






Table 26

Average Bias Results of Ability for 50 Examinees

              Gibbs Sampling           Marginal Bayesian
     θ   n = 10   n = 20   n = 40   n = 10   n = 20   n = 40
  -2.5    1.233     .892     .597     .987     .633     .353
  -2.0     .913     .609     .428     .713     .393     .220
  -1.5     .591     .392     .257     .427     .219     .086
  -1.0     .390     .230     .129     .273     .112     .005
   -.5     .182     .104     .059     .127     .039    -.006
    .0    -.012    -.012    -.004    -.014    -.012    -.001
    .5    -.147    -.135    -.068    -.090    -.077     .001
   1.0    -.354    -.246    -.166    -.244    -.128    -.042
   1.5    -.600    -.355    -.287    -.431    -.178    -.111
   2.0    -.763    -.595    -.424    -.535    -.375    -.206
   2.5   -1.191    -.890    -.589    -.942    -.625    -.334








Table 27

Average Bias Results of Ability for 100 Examinees

              Gibbs Sampling           Marginal Bayesian
     θ   n = 10   n = 20   n = 40   n = 10   n = 20   n = 40
  -2.5    1.214     .844     .560    1.019     .657     .393
  -2.0     .882     .565     .399     .722     .409     .254
  -1.5     .595     .381     .231     .469     .257     .111
  -1.0     .360     .211     .126     .274     .124     .040
   -.5     .140     .090     .078     .092     .042     .036
    .0    -.017     .000    -.008    -.019    -.000    -.009
    .5    -.186    -.100    -.063    -.143    -.054    -.020
   1.0    -.365    -.232    -.136    -.278    -.145    -.054
   1.5    -.584    -.383    -.229    -.459    -.257    -.111
   2.0    -.869    -.581    -.317    -.708    -.425    -.170
   2.5   -1.194    -.869    -.531   -1.000    -.687    -.364








Table 28

Average Bias Results of Ability for 200 Examinees






Gibbs Sampling 


Marginal Bayesian 
Appendix



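model memory;
const
   I = 40,   # number of examinees
   J = 10;   # number of items
var
   y[I,J], p[I,J], theta[I], lambda[J], zeta[J], b[J];
data in "memory.dat";
inits in "memory.in";
{
   for (i in 1:I) {
      for (j in 1:J) {
         # two-parameter logistic model in slope-intercept form
         logit(p[i,j]) <- lambda[j]*theta[i] + zeta[j];
         y[i,j] ~ dbern(p[i,j]);
      }
      theta[i] ~ dnorm(0,1);           # standard normal prior on ability
   }
   for (j in 1:J) {
      lambda[j] ~ dnorm(0,1) I(0,);    # slope prior truncated to positive values
      zeta[j] ~ dnorm(0, 0.0001);      # diffuse prior on the intercept
      # item difficulty; the minus sign is assumed from
      # lambda*theta + zeta = lambda*(theta - b), as the scan is unclear here
      b[j] <- -zeta[j]/lambda[j];
   }
}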